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Abstract. Deflation techniques for Krylov subspace methods and in particular the conjugate 
gradient method have seen a lot of attention in recent years. They provide means to improve the 
convergence speed of the methods in a rather straight forward way by enriching the Krylov subspace 
with a deflation subspace. The most common approach for the construction of deflation subspaccs 
is to use (approximate) eigenvectors. However, there are many situations where a more general 
deflation subspace is advisable. 

We derive an estimate for the speed of convergence of the deflated conjugate gradient method us- 
ing theory originally developed for algebraic multigrid methods. Our result holds for general deflation 
subspaces and is based on the weak approximation property — known from multigrid methods — and 
1 a measure of the A invariance of the subspace by the strengthened Cauchy-Schwarz inequality. In ad- 

dition the result suggests that the techniques developed to construct efficient interpolation operators 
in algebraic multigrid methods can also be applied to improve deflation subspaces. 
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1. Preliminaries. Consider solving the linear system of equations 

Ax = 6, (1.1) 

where A g K nxn (K = R or K — C) is self-adjoint and positive definite and x, b £ K™. 
In this paper we are interested in the case where the matrix A is large and sparse. 
The conjugate gradient (eg) method [TTJ[T7] is an iterative method which is often well 
suited to solve these systems. The speed of convergence of the eg method depends 
on the condition number k of the matrix A, more precisely on the distribution of its 
eigenvalues [17l [22] . When the condition number k is large it can become necessary 
to precondition the linear system such that a satisfactory speed of convergence is 
obtained. 

One possibility to precondition the eg method is via deflation as introduced by 
Nicolaides [TS] and Dostal [B] , see also [HJ EH HI HE] ■ We mention in particular the 
paper [18] , which derives a variant of Nicolaides' deflated eg that is mathematically 
equivalent but is algorithmically much closer to the standard eg algorithm. The basic 
idea of deflation is to "hide" certain parts of the spectrum of the matrix A from 
the eg method itself, such that the eg iteration "sees" a system that has a much 
smaller condition number than A. The part of the spectrum that is hidden from eg 
is determined by the so called deflation subspace S C K n . 

In [15] the space S is constructed as follows. The variables are combined into 
aggregates 21, C {1, 2, ... , n}, i = 1, . . . , m such that 



(J2ti = {l,2,...,n} and 2t i n2l J - = for i ^ . 



i=i 



Then S is spanned by the vectors v^ 1 ' , i = 1, . . . , m with 



1 if i G Vj 
otherwise 
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This procedure can be motivated in the following way: Assume that the matrix 
A arises from the discretization of a partial differential equation by a finite ele- 
ment/difference/volume method. Components of vectors then correspond to grid 
points of the underlying discretization scheme. When the aggregates 2li are chosen 
appropriately, S is close to the space consisting of those vectors whose values vary 
only slowly between neighboring grid points. Those vectors represent in many appli- 
cations the eigenvectors corresponding to small eigenvalues. By deflating such vectors 
the corresponding eigenvalues are removed from the spectrum and the deflated eg 
method behaves as if the matrix had a much smaller condition number. Interestingly, 
this procedure from [15] is completely analogous to the construction of prolongation 
operators in (non-smoothed) aggregation based multigrid methods [I] . 

Another viable and widely used approach for deflation, consists of spanning S 
directly by the eigenvectors corresponding to the smallest eigenvalues [IS]- This im- 
mediately leads to the removal of the smallest eigenvalues from the spectrum of A. 
The major drawback of this approach is that it often does not scale when the size of 
the system increases, because in many cases the number of eigenvalues below a given 
threshold grows with the size of the system. Thus, as the system size increases, more 
and more eigenvectors need to be computed to keep the convergence rate at a desired 
level. However, in the case where only a few extremal, i.e., very small eigenvalues 
exist, independent of the system size, however, this approach works reasonably well. 

Recently a combination of the two approaches has been rediscovered in the context 
of simulations in Quantum Chromodynamics 114] . Similarly to |15) , aggregates 2tj are 
introduced, but since eigenvectors belonging to small eigenvalues do not necessarily 
have slowly varying components in this application, a few eigenvectors w\, . . . ,we 
corresponding to the smallest eigenvalues of the system are computed. Then for 
every vector Wj and every aggregate 21^ the orthogonal projection of wj onto 

Vi = span{e^ : j G 2lj}, where with ef' = 8jj is the j th canonical vector, 

(i) 

is computed and the deflation subspace S is spanned by Wj for i = 1, . . . , m and 
j = 1, ...,£. This approach has the advantage that it often scales when the size of the 
system is increased while the number of eigenvectors I and the size of the aggregates 
are chosen to be constant. This particular strategy to define the deflation subspace 
resembles the setup of adaptive aggregation based algebraic multigrid methods [4], 
where the prolongation operator is constructed in a similar way. 

Motivated by these similarities to multigrid methods we investigate in this paper 
more closely the relation between the ranges of good multigrid prolongation operators 
and good deflation subspaces for the eg method. In doing so we analyze the conver- 
gence of deflated eg with techniques known from algebraic multigrid methods, see 
P2 [TBI [20] . The theory of algebraic multigrid measures the quality of a prolongation 
operator by a weak approximation property. We show that the speed of convergence 
of deflated eg can be estimated using the weak approximation property, showing that 
prolongation operators that work well in the multigrid setting also yield good results 
when used to span the deflation subspace. Furthermore, with this choice of the de- 
flation subspace, deflated eg exhibits similar scaling behavior as multigrid methods 
for many applications. This means that the number of iterations stays constant when 
we increase the system size due to finer discretizations, and, as opposed to standard 
(algebraic) multigrid, deflated eg does not require a smoother. 

We finish this introduction explaining some basic notation. Assume that x £ K" 
is an approximation to x, the solution of Then the residual r € K™ is given by 
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r = b — Ax, the error by e = x — e. Note that Ae = r. 

Let (x, y) — y*x be the euclidean inner product. Since A is self-adjoint and 
positive definite we can define the A-inner product and the A-norm by 



{x,v)a : = ( Ax >y) and \\ x \\a := V ( x > x )a ■ 

We denote by [t] the set of polynomials in the variable t with degree less than 
or equal to k. Let S C K™ be a subspace, then S 1 - is its orthogonal complement with 
respect to the 2-inner product and S ±A is its orthogonal complement with respect to 
the A-inner product. By ir(S) £ K nxn we denote the orthogonal projection onto S 
for the 2-inner product and by tta(S) its counter part for the A-inner product. The 
distance between a point x £ 1™ and a subspace S C K" is given by 

dist(<5>, x) = min ||a; — y\\ . 

The rest of the paper is structured as follows. Section [5] gives a short introduction 
into deflation methods. In Section [3] we analyze the convergence of deflation meth- 
ods by analyzing the condition number of the matrix A(I — tta(S)) £ K nx " which 
we estimate by using some results from multigrid theory. In Section |4] we further 
show that our general convergence result yields the known bounds derived for the 
case, where eigenvectors are directly used to span the deflation space. Moreover we 
demonstrate how prolongation operators from the classical algebraic multigrid theory 
for M-matrices can be used to obtain deflation subspaces. Then, we present some 
numerical experiments confirming the theory in Section [5] Finally in Section [5] we 
discuss how accurately the the action of the operator tta(S) has to be computed to 
achive overall convergence of the method. 

2. Review of Deflated CG. The m th Krylov subspace JC m (A, v) corresponding 
to a matrix A £ K™ x ™ and a vector v £ 1™ is given by 

K m {A, v) := span{w, Av, A 2 v, . . . , A m ~ 1 v} = {P(A)v : P £ K m _ x [t]} . 

The eg method generates the iterates X\,X2,x^, . . . £ l n for a given initial guess 
x el™, where 

Xi = x + Si and e f £ /Q(A, r ) 
such that the error e, = x — x\ fulfills 

||ei|U = ||x - Xi\\A = min ||a; - x\\a ■ 

Note that since e, = x — = x — (xq + e,) = e — e.j, in particular e = x — Xq, and 
putting e = x — x this is equivalent to 

IMU = || eo - ei\\ A = min ||e - e|| A . 

By definition of the Krylov subspace, and since gj £ ICi, we can write 

h = P{A)r = P(A)Ae 

with P £ Kj_i[t] and thus 

ei = e - gj = e - P(A)yleo = P(A)e where P = l-tP. (2.1) 
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A polynomial P can be written in the form (|2.1j) if and only if P g K;[i] and P(0) = 1. 
Hence 



||ej|| A = min \\P(A)e \\ A . (2.2) 

PeKi[t] 

P(0) = 1 

Recall that the matrix A is self-adjoint positive definite so that there exists a uni- 
tary matrix Q 6 K nxn and a corresponding diagonal matrix A = diag(Ai, . . ., A n ) € 
K™ x " with Ai > A 2 > • • ■ > A„ > yielding the eigen-decomposition A = QAQ*. The 
columns gi, . . . , g„ e K n of Q form an orthonormal basis of eigenvectors corresponding 
to the eigenvalues Ai, . . . , A„. 

Writing the initial error eo as eo = Qt; = 2j=i an d using P(A)qj = P(Xj)qj 
equation (|2.2[) yields 

n n 

IN& = JT< jE ^-P(A)eo||i = min . V &l 2 l P (Ai)| 2 Ai. ( 2 - 3 ) 

P(0)=1 J = 1 P(0)=1 J=1 

Thus the eg method implicitly constructs for each iterate ir, a polynomial Pi of degree 
at most i which interpolates the point (0,1) and approximates the points (Ai,0) 
by minimizing the sum in (|2.3p where the values Pj(Aj) are weighted by Ay. 
Clearly, the minimum in (I2.3[) will be smaller, and the convergence of the eg iteration 
will thus be faster, if the weights are small, and — given the constraint P(0) = 1 — 
particularly so if the weights corresponding to small eigenvalues are very small. With 
this interpretation of eg convergence, deflation can be regarded as a technique to make 
the weights small by making small for large j, i.e., small eigenvalues. 

Assume that we are given a subspace S C IK™ which contains (approximated) 
eigenvectors corresponding to the smallest eigenvalues of A. Since deflation requires 
a first preparatory step before starting the eg iteration, we from now on denote the 
initial guess by i_i with initial error e_i = x — X-\. We want to compute a new 
vector xq with error eo such that the part of eo belonging to S is "removed" . Setting 
.To = X-i + eo and eo = e_i — eo we thus want eo to be a projection of e_i onto S. 
Taking the A-orthogonal projection, i.e., eo ~ 7r J 4(5)e_i is particularly adequate: Let 
the columns of V G K nxm be an arbitrary basis of S, i.e., range(F) = S. Then 

tt a (S) = v^Avy^A. 

Since 

So = 7r A (<S)e_i = V(V*AV)~ 1 V*Ae-i = V(V*AV) -1 V*r_i 

we can compute eo and thus xq without explicit knowledge of e_i. Moreover, because 
of 

n 

||e_i-eo||^ = ||e ||^=X;|6| 2 A i 
i=i 

the choice eo = 7r^(<S)e_x precisely minimizes the sum of all weights in (|2.3I) . 
Computationally, since 

e_i = 71-^ (S)e_i + (I- ir A (S))e-i = e + e 
4 



and x = X-x + e_ l5 we obtain the solution x of (jl.lj) once we have computed (/ — 
7r j 4(5))e_i, the A-orthogonal projection of e^x onto S ±A . Modifying the eg method 
to restrict the search directions to the subspace S ±A and thus minimizing the A-norm 
of the error over the subspace S ±A yields the deflated eg algorithm from [18] which 
computes the desired projection, described in Algorithm [T] 

Algorithm 1 Deflated Conjugate Gradients Method 
choose K" 
r_i <— b — Ax-i 
i <- V(y*AV)- x V*r-x 
x <- x-x + eo 
ro b — Axo 

Po^r - V(V*AV)- l V*Ar 
for i <— 0, 1, . . . , n — 1 do 



X i+1 <- Xi + CXiPi 

n+x <-n- aiApi 

a , (r i+ i,r i+ i) 

p l+ x <- r l+1 - V{V*AV)- x V*Ar i+l + p lPl 
end for 



There is another, mathematically equivalent, formulation of the method which 
lends itself more easily for an analysis. To derive the method, we summarize some 
important properties in the following lemma. 

Lemma 2.1. Consider the singular linear system 

A(I-Tr A (S))x = (I-Tr A (S)yb, (2.4) 

which we call the deflated (linear) system. We have the following properties: 

1. The following equalities holds 

A(I - n A (V)) = (I- n A (V))*A = (I - ir A (V))*A(I - n A (V)) . (2.5) 

2. The matrix A(I — tt a (S)) is self-adjoint and positive semi-definite. 

3. The system is consistent, i.e., the right hand side {I — n A (S))*b is in the 
range of A(I—tt a (S)). This implies that the system has at least one solution. 

4-. If x is a solution of (|2.4I) then 

(I - TT A {S))X = (I - 7T A (S))X , 

where x is the solution of the linear system Ax = b. 
Proof. Let the columns of the matrix V € ]g™x m form an arbitrary basis of S. 
Using the relation 

A{I - n A (V)) = A— AV(V*AV) -1 VM = (/ - tt a (V))*A 

and the fact that (I — tt a (V)) is a projection yields 

A(I - n A (V)) = A(I - tt a (V))(I - tt a (V)) = (J - 7T A (V)yA(I - n A (V)) . 

This proves the first assertion and also shows that A(I — ir A (V)) is self-adjoint. Due 
to (|2.5j) we have 

(A(I - 7T A (V))X, X) = ((I - 7T A (V))X, (I - 7T A (V))X) A = \\ (I - 7T A (V))x\\ A > 
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which proves the second assertion. 

Again due to (12.51) and the fact that A has full rank we have 



range 



A(I-ir A (S)) 



range 



(J-7r A (5))M 



range 



Hence the system is consistent which proves the third assertion. Using (|2.4p and (|2.5 
yields 

A(I - ir A (S))x = (7 - ir A (S)yb =(I- n A {S))*Ax = A(I - ir A (S))x . 
Multiplying with A -1 from the left we get 



Since 



(I-tt a (S))x=(I-tt a (S))x. 



tt a {S)x = V(V*AV)- 1 V*Ax = V^AVy^b 



we can compute ir A (S)x without explicit knowledge of x. Thus due to Lemma 12.11 
if we have a solution x for the deflated system (|2.4[) we obtain the solution for the 
original system by 

x = (I — ir A (S))x + ir A (S)x 
= (I- tt a (S))x + V{V*AV)- l V*b . 



We now want to solve the deflated system (12. 4p . Since the matrix A is positive semi- 
definite we can apply the eg method. The lack of regularity is no impediment to the 
standard eg iteration (cf. [12]) as long as (|2.4|) is consistent, which was shown to be 
the case in Lemma |2~T1 

Assume that xq,X\, . . . are the iterates and tq, f±, . . . are the corresponding resid- 
uals of the eg method applied to the deflated linear system (12.4[) . If Algorithm Q] is 
initialized with xq = £q we have the relation (see |10j ) 



= {I- n A (S))xi + V^AVy^b 



and 



(2.6) 



Thus Algorithm [T] is mathematically equivalent to solving (|2.4I) via the eg method 
and then computing the approximation for the solution by (12.61) . 

Hence, for the purpose of an analysis we can think of deflated eg as applying the 
standard eg algorithm with the matrix A(I — tt a (S)) to solve (|2.4p . In a practical 
implementation Algorithm [T] is to be preferred, since numerically the action of (/ — 
tt a (S)) is not computed exactly and we are thus facing an inconsistent system which 
may introduce instabilities [12] . There are various other formulations of the deflated 
eg method that are mathematically equivalent (for an overview see [10]) for which our 
analysis will hold as well. 

Let /ii > • • • > fi n > be the eigenvalues of the self-adjoint and positive semi- 
definite matrix A(I — n A (S)). Let I g N denote the largest index such that fj,£ ^ 0. 
The errors of the eg iterates then satisfy 



MU < 2 



|e |U fori = 0,1,2,. 



where K c g = ^ , see [5] HE] ■ We call n e s the effective condition number of the deflated 
matrix A(I — it A (S)) to distinguish it from the condition number k of the original 
matrix A. Thus in order to analyze the convergence of deflated eg it suffices to 
estimate the largest and smallest non-zero eigenvalue of the matrix A(I — tt a (S)). 



3. Convergence Analysis. In this section we give an estimate for the speed of 
convergence of the deflated eg method by estimating the effective condition number 
K e ff of the matrix A(I — tta(S)). 

3.1. Eigenvalue Bounds. In order to estimate the speed of convergence of the 
deflated eg method we need estimates for the largest and smallest non-zero eigenvalue 
of A(I — tt a (S)). The largest eigenvalue of the matrix A(I — ir A (S)) is the maximum 
of the Rayleigh quotient (cf. [23]), i.e., 

(A(I-n A (S))x,x) 

Hi = max 



x£K"\{0} (x,x) 

In addition, from the fact that Air A (S) — AV (V* AV)~ 1 V* A is positive semi-definite 
and thus 

(A(I - ir A (S))x, x) = (Ax, x) - (Air A (S)x, x) < (Ax, x) , 

we obtain 

/ii < max i^A. = Al = |U|| . (3.1) 

i£K"\{0} (x,X) 

This gives a simple upper bound for the largest eigenvalue of A(I — n A (S)). 

The following auxiliary result will be used to derive a lower bound for the smallest 
non-zero eigenvalue. It can be regarded as a special case of the min-max (or Courant- 
Fischer-Weyl) theorem, cf. [23 . 

Lemma 3.1. Let M e K" x ™ be self-adjoint, M = UDU* with U € K IIX ™ unitary 
and D — diag(//i, . . . , fj, n ) with fJ>i > ■ ■ ■ > fJ- n > 0. Then for k — 1, . . . , n 

(Mi, x) 

u k = mm — — . 

xGK™\{0} (x,x) 
a;_Lspan{uk + i , . ...u n } 

Proof. Let Uj denote the columns of U which form an orthonormal basis of 
eigenvectors of M. Let x _L span{itfc + i, . . . , u n }. Then x = Ylj=i ^j u j an d we have 

(Mx,x) = (UDU*x,x) = (DU*x,U*x) = 

with£ = (Ci,...,6,0,...,0) T . Thus 

k 



3 = 1 

k 



which yields 

(Mx, x) > (£, = Mfe (U*U£, = (U£, U£) = Hk (x, x) . 
The assertion now follows since 

(Mu k , u k ) = (n k Uk, u k ) = i± k (u k , u k ) 
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and u k _L span{u fc+1 , . . . ,u n }. □ 

Lemma 13.11 characterizes the smallest non-zero eigenvalue fig of A(I — tt a (S)) 
using span{it£ + i, . . . ,«„}, the kernel of A(I — n A (S)). Since the matrix A has full 
rank, the kernel of A(I — tt a (S)) is the kernel of (I — tt a (S)) which is the deflation 
subspace S. Thus 

(A(I-n A (S))x,x) 
M = mm r , 

xes^UO} \x,x) 



and due to (|2.5p 

((I-ir A (S))*A(I-ir A (S))x,x) 
fi e = mm 

i€S i \{o} (a;, a;) 

min (A(I-Tr A (S))x,(I-Tr A (S))x) 

xes->-\{0} (x,x) 

= min Ml^^P*. (3 .2) 

xes^\{o} \\x\\l 

We are now ready to employ techniques developed for the analysis of algebraic 
multigrid methods to further advance our analysis. 

3.2. Weak Approximation Property. In order to estimate the smallest non- 
zero eigenvalue [it of A(I— it A (S)) we introduce some basic ideas of algebraic multigrid 
convergence analysis. 

Algebraic multigrid methods [T6j [20] are based on the assumption that the error 
of a given iterate can be split into highly oscillatory and slowly varying components 
of the error. The so-called smoother reduces the highly oscillatory components while 
the coarse grid correction reduces the slowly varying ones. In many applications 
the highly oscillatory components are spanned by the eigenvectors corresponding to 
large eigenvalues while the slowly varying components are spanned by the eigenvalues 
corresponding to the small eigenvalues of the matrix. In order to quantify those 
properties, classical algebraic multigrid theory (cf. [2]) measures how well the coarse 
grid correction, denned by the prolongation operator, is able to reduce the slowly 
varying error components. In order to measure the quality of interpolation operators 
we define the weak approximation property as follows. 

Definition 3.2. A subspace S C K™ fulfills the weak approximation property 
with constant K > if 

dist(S,x)j< -^-\\x\\ A forallxeK n . (3.3) 
ll^ll 



If the diagonal entries an of A fulfill an = 1 then Definition 13 . 2 1 coincides with the 
definition from (2] |3l [TBI HO] for a weak approximation property. It is called "weak" 
because it is only sufficient for a two-level convergence theory [TBI Section 4.5] instead 
of multi-level one. Unit diagonal entries may be achieved by using A — D~?AD~2 
instead of A which would be reflected in a change of the constant K. 

Assume that V € K nxm is a multigrid prolongation operator. Then often a 
restriction operator R £ K mx ™ and a constant C £ Hi can be determined, such that 

\\x-VRx\\ 2 2 <C\\x\\\. 
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is derived where C is independent of the size of the matrix (cf. [2J [31 HH [20] )■ The 
following lemma shows that this implies that the weak approximation property is 
fulfilled for K = C\\A\\. 

Lemma 3.3. LetV eK nxm , Re K mxn and assume that 

\\x-VRx\\l<C\\x\\ 2 A forallx£K n 

for a constant C > holds. Then the subspace S = range V fulfills the weak approxi- 
mation property with constant K — C \\A\\ . 
Proof. We have for an arbitrary x e K n 

dist(<S, x)2 = min \\x — = min \\x — Vz\\\ 
yes zeK m 
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<\\x-VRx\\%<C\\x\\\=^-\\x^ A 



Strictly speaking, any subspace S fulfills a weak approximation property, just by 
choosing K large enough, since 

dist(5,.x)2 < \\x\\ 2 2 < X n \\x\\ 2 A for all x E K n . 

The interest of Definition 13.21 is in cases where the subspace S is such that that 
K is small and in situations where K is constant for a whole family of matrices 
A and subspaces S, the family of matrices A representing, e.g., different levels of 
discretization of a continuous operator. 

The following theorem now gives an estimate for the smallest eigenvalue of the 
matrix A(I — 7Ta(S)) in terms of the constant K of the weak approximation property. 

Theorem 3.4. Let S C K n be a subspace such that the weak approximation 
property (|3.3[) is fulfilled with constant K . Then the effective condition number K e s = 

of the matrix A(I — tta(S)) satisfies 



K e g < — where t :— mm — G (0. 1 . (3.4) 

C xes±\{o} \\x\\ 2 A 



Proof. Denote by 7r(«S) the orthogonal projection onto S with respect to the 
2-inner product. Then 

||z - 7r(5)ar||l = mm || z - y\\* = dist(5, x)\ . (3.5) 
yes 

For x € 5 we have ir(S)x — and thus due to (|3.5[) 

||a;||i = ||a;-7r(5))a;||l = di8t(5,a;)l 

which, using p.3p gives 

Nil < pr INI A hrxeS ± . 
By applying this estimate to the denominator in Q3.2[) we obtain 



(3.6) 



Hence, by (glj and (131))) . 



Ml . mil # 
Kcff = — - Tail: = T ■ 

It remains to show that £ G (0, 1]. Since ita(S)x = x if and only if x G <S, we have 
||x — tta(S)x\\\ > for x € S \ {0}. Thus £ > 0. The A-orthogonal projection 
minimizes the distance in the A- norm, i.e., 

||x - TT A {S)xf A = mm ||x - y\\ 2 A . 

y&S 

Hence ||a; — 7Tyi(<S)a;||^ < ||x||^, which proves £ < 1. □ 

Now that we have derived a connection between the effective condition number 
and the weak approximation property used in the analysis of algebraic multigrid 
methods, it remains to interpret the quantity £ which is what we do in the next 
section. 

3.3. Strengthened Cauchy-Schwarz Inequality. We first introduce the con- 
cept of abstract angles between subspaces which are defined by the strengthened 
Cauchy-Schwarz inequality. 

For two subspaces Hi,H 2 C K™ with H\ fl H 2 = {0} there exists a constant 
7 G [0, 1) such that 



I (u, v) I < 1 y/(u,u)y/M Vu e JTl, Vv G H 2 (3.7) 

[7J Theorem 2.1]. Equation (|3.7[) is called strengthened Cauchy-Schwarz inequality 
and 7 can be interpreted as the abstract angle between Hi and H 2 • 
Inequality (|3.7[) implies that for u G Hi, v G H% we have 

I (u, v) I < 7 (u, w) 2 (v,v) 2 < % [{u, u) + (v, v)\ , 
since for any two numbers a, & we have \ab\ < i(|a| 2 + |6| 2 ). Hence 

(1 - 7) [(u, u) + (v, v)] < [(u, u) + (v, v)} - 7 [(«, it) + (v, v}] 

< [(u,u) + (v,v)]-2\(u,v}\ 

< [{u, u) + (v, v)] - 2 Re (u, v) 
= (u + v, u + v) . 

Taking the infimum over all v G H 2 yields 

(1-7)IM| 2 < inf \\u + v\\ 2 Vug Hi. (3.8) 

v£H 2 

We now apply this general result in the case where Hi = S , H 2 = S and (•,•) is 
the A- inner product, like in [T] and [5]. The A-orthogonal projection tv a (S)u yields 
the vector in S which is closest to u in the A-norm. Thus the infimum in (I3.8[) is 
obtained for v = —tta(S)u and therefore 

(i-i)\\u\\ A <\\u-ir A (S)u\\ A VueS x . 

This yields a bound for £ as 

C= min fc|^*>(l- 7 ). (3.9) 

zGS^\{0} llxl^ 
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Using (I3.9[) we can show the following lemma which states that we can interpret £ as 
a measure of approximate A-invariance of the subspace 5, i.e., a small value of 7, and 
thus large value for £, indicates that AS is close to S. 

Lemma 3.5. If S is A-invariant, i.e., AS = S , then 7 = and thus £ = 1. 

Proof. Since the subspace S is A-invariant, we have Av € S and thus 

(u,v) A = {u,Av) =0 Vu e 5 1 , V« e 5. 

This gives 7 = and thus £ = 1. 

We can now formulate our main result. 

Theorem 3.6. Let S CK n , S = range(U), V £ K nxm and let V fulfill the weak 
approximation property (|3.3p with constant K . Let S C K™ be the subspace spanned 
by the columns of V . Furthermore let 7 € [0, 1) be the smallest constant such that 

\(u,v) A \ <i{u,u)l(v,v)l VueS^^VveS. (3.10) 
Then the effective condition number «; e g = of the matrix A(I — 7Ta(V)) satisfies 

K 

Keff < yq r • (3.11) 



Proof. Equation (l3~TTj) follows from ([31} and (|3T9"]) , □ 

This theorem gives a bound on the effective condition number of the deflated 
matrix A(I — tta(S)) which depends solely on the weak approximation constant K 
and the measure £ on the A-invariance of the deflation subspace S. 

4. Applications. In this section we apply our theory developed so far to the 
classical case where the deflation subspace iS is spanned by the eigenvectors corre- 
sponding to the (n — k) smallest eigenvalues. In addition we consider the case where 
S is the range of a prolongation operator from the the classical algebraic multigrid 
method described in [5|fT6|l20]. 

4.1. The Case of Exact Eigenvalue Deflation. Let S be spanned by the 
eigenvectors corresponding to the (n — k) smallest eigenvalues, e.g., V = [qk+i \ ■ ■ ■ \q n ], 
where k € N. It has been shown for this case (cf. O Section 1]) that n c g = j*-. To 
demonstrate the quality of the bound r[~y) from Theorem 13.61 we now show that in 
this case we actually have tj^j — K cS, i-e., the bound is best possible. 

We hrst consider £. Since the subspace S is A-invariant, we have £ = 1 due to 
Lemma 13.51 

We now compute the smallest value for K, such that the weak approximation 
property (|3.3I) is fulfilled. If 7r(5) is the orthogonal projection onto S (in the 2-inncr 
product) then 

\\x — ir(S)x\\2 = min ||x — y\\ 2 — dist(<S, x)\ . 

yes 

We expand x in terms of the orthonormal eigenvectors qi of A, 

n 

x = 5^£i% ■ 

i=l 
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Then the orthogonal projection ir(V)x of x onto S fulfills ir(V)x = Y^,7=k+i ana - 
thus 

k n k 

dist(S,x) 2 2 = \\x-ir(V)xg = \\J2Zi<li + £ (6-6)9illa = Z)lC| a - 

2—1 2 — fe+1 2 — 1 

This yields 

Hi = E i^i 2 ^ ^ E &i 2A < ^ Afc E i^i 2 = Xk &st(s,x)t . 

i—1 i—1 i—1 

Hence the weak approximation property (|3.3[) holds with K = = jj- . 

Putting things together we see that the bound from Theorem 13.61 is (jz^y = j^, 
i.e., the bound is equal to the effective condition number n e ff and thus best possible. 

4.2. Deflation Subspaces for M -Matrices. The classical algebraic multigrid 
method [16j [20] was specifically designed for the case that A £ R nxn is an M- 
matrix, i.e., A is symmetric positive definite and < for i ^ j. In this section we 
show that we can use the prolongation operators constructed in the classical algebraic 
multigrid method as the operator V which spans the deflation subspace 5 and derive a 
priori bounds on the effective condition number following the analysis done in |16[ 120] . 

The construction of prolongation operators is done by inspecting the graph G(A) 
of a matrix A € K" xn which is given by 

G(A) = (W,E) where W = {1, 2, . . . , n} 

E = {(i,j)€WxW:a i:i ^0,i^j}. 

The neighborhood of a node i € W is given by 

Ni-.= {je W:(i,j)eE}. 

To construct the prolongation operator, or equivalently the deflation subspaces, we 
split the variables W into coarse and fine variables C and J 7 , such that W = CUJ 7 
and Ni n C ^ for i e J. The coarse variables have a direct representation on the 
coarse grid, or equivalently the deflation subspace, while the fine variables interpolate 
from the coarse ones. For simplicity of notation assume that the variables in C have 
a smaller index than those in F, i.e., C = {1,2,..., m}, T — {m + 1, m + 2, . . . , n}. 
For every fine variable i € T choose a set of variables Pi C Ni DC. The value for 
the variable i is then interpolated from the variables in Pi . Defining the interpolation 
weights 

— 0-ik .,, J2k£Ni a ik 

Wik = cti with ai = — tor i E J- 

Qii 2-~/k£P ik 

yields the prolongation operator V € J£ nxm by 

iye% = {% c [ 0ri J C . (4.1) 

[EfeeA^ifc^ forte J 7 

Under the assumption that the C / .F-splitting is reasonably well chosen the classi- 
cal multigrid theory yields an estimate for the constant K of the weak approximation 
property 
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Theorem 4.1. Let A G R™ xn be a symmetric weakly diagonally dominant M- 
matrix, i.e., < for i ^ j and ; ay > /or a/Z i. If for fixed t > 1 a 
C /T -splitting exists such that for each i G T there is a P, G C f~l wziZi 

^ |a ife | > - M fori&F (4.2) 

feGPi jeNi 

then there exists a matrix R G ]R mxn such that the operator V from (|4.ip fulfills 

\\e-VRe\\ 2 D <r\\e\\ 2 A 

where D = diagA is the diagonal matrix containing the diagonal entries an of A. 
Proof. Define the restriction R G R mx ™ as 



R = 



where I m G R mxm is the identity matrix. Thus R maps a coarse variables to itself 
and a fine variables to zero. Then the result follows from [20] Theorem A. 4. 3]. □ 

Corollary 4.2. Under the same assumptions as in Theorem \4-l\ we have that 
V fulfills the weak approximation property (|3.3|) with 



Proof. Directly follows from Lemma 13 . 31 the fact that (min, a% 



i < Hi 



for 



ieR™ and Theorem EO □ 

Example 4.3. Let N G N be odd. Consider the discrete 9-point Laplacian, i.e., 
the block tridiagonal matrix 



A = 



B C 
C B 



'■■ C 

C B 



r,N 2 xN 2 



with B,C G 



tNxN 



B 



-1 



-1 



and C 



-1 -1 



The graph G(A) of A is a regular N x N grid with added diagonal connections, see 
figure \jA\ We set C as the variables in odd rows and columns (O), J- as the remaining 
variables (o, • and-k). 

The requirement (|4.2[) is equivalent to t > en for i £ J. A straight forward 
computation yields on = 2 for i G Qfj = 4 for i G o and on = | for t 6 *. Hence 
t = 4 fulfills (|4.2j) . Due to Gershgorin's theorem \23f the eigenvalues Xi of A fulfill 
Xi G [0, 16] and thus \\A\\ < 16 and due to (14.21) the weak approximation property is 
fulfilled for K = 8. 
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Figure 4.1. 


The graph of A split into C and T . 




p 


Iterations 


Residual 


Error 




4 


8 


5.64429 • 10" 7 


1.53486 • 10 


-7 


5 


8 


8.4304 • 1CT 7 


4.64197-10 


-7 


6 


9 


5.27683 • 1CT 7 


1.63928-10 


-6 


7 


9 


6.11646- 1CT 7 


5.74174-10 


-6 


8 


9 


6.36346 • 1CT 7 


2.56345 • 10 


-5 


9 


9 


6.57814- 10- 7 


6.60374 • 10 


-5 



Table 5.1 
Number of iterations where N = 2 P — 1. 



5. Numerical Experiments. To carry out our numerical experiments we con- 
sider the matrix V from (|4.1[) with A, C and T from example 14.31 Thus the deflation 
subspace S is range V. 



5.1. Independence of the Grid Size. In Example 14.31 we have seen that the 
constant K is independent of the grid size N . Thus we expect the method to converge 
in a constant number of iterations if the abstract angle 7 is independent of N, Hence 
we measure the number of iterations for different sizes N — TP — 1 of the linear system. 

We choose a random right hand side b such that the solution x fulfills ||a;| = 1. 
Then we run the deflated eg method [18] until the residual of the i th iterate satisfies 
||r, || < 10~ 6 . The number of iterations is listed in table [5~T1 where we observe that 



the number of iterations stays constant. 

5.2. Numerical Computation of the Constants. In this subsection we want 
to verify our theory by numerically computing the condition numbers k, K e ff the 

constants K and 7 and the estimate for the effective condition number for a 

' 1 — y 

small linear system. This computation is only possible for small linear systems since 
it involves the computation of eigenvectors of dense matrices which are of the same 
size as the linear system. 

We are interested in the smallest K such that the weak approximation property 
with constant K holds. For small linear systems such K can be computed numerically 
as follows: From Definition 13.21 it follows that K is given by 

n in dist(«S, x)n 

K=\\A\\ sup v ' . 

iei"\{o} IRU 

Let the columns oiW(£ K nxm form an orthonormal basis of S ± . Then the orthogonal 
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A 


A(I-ir A (V)) 




Ai 


0.0577 


6.0708 


Mi 




11.9616 


11.9586 


Hi 


K 


207.3403 


1.9698 








1.9703 


K 






0.3369 


7 






2.9715 


K 

1-7 



Table 5.2 

Condition number of the linear systems for N = 2 s — 1. 



projection (7 - tt(5)) onto S 1 - fulfills (J - 7r(«S)) = WW* and thus due to (O 

^ m.h — -7r(5))a:||i ..... ||WW*sb||! 

K=\\A\\ sup Jil " 112 = ||.4|| sup " 2 IL . (5.1) 

:ceK"\{o} IFIU iei ,i \{o} \\ x \\a 

Using the eigendecomposition A = QAQ* yields 

\\x\\ 2 A = (Ax,x) = (QAQ*x,x) = (a?Q*x,A?Q*x\ = \\A^Q*x\\ 2 2 . (5.2) 

Since we are interested in the supremum over all vectors in K n \ {0} we can substitute 
x by QA~iz in (|5.ip . This yields due to (|53|) that 

K=\\A\\ sup l|W ! | r^ lli =||A||||W-QA-^f. 

zGK"\{0} Il z ll2 

The matrix norms on the right hand side can numerically be computed via the sin- 
gular value decomposition. The constant 7 from the strengthened Cauchy-Schwarz 
inequality is computed by the method from |13j which is also based on the singular 
value decomposition. 

The results for N — 2 5 — 1 are given in Table 15.21 We see that is a good 
estimate for the condition number n e g of the deflated matrix in this example. 

6. Influence of the Accuracy of Computations. The deflated eg method 
involves the solution of the "inner" linear system 

{V*AV)z l+1 = V*Ar l+1 (6.1) 

in every step of the iteration. If the dimension m of the deflated subspace, i.e., the 
number of columns of V is small, we can solve (|6.1j) exactly up to numerical errors, 
e.g., by using a factorization of V*AV. Often, however, m will be large (we had 
to = rt/4 in the numerical experiments of Section [5]), so that solving (|6.ip will be 
done using an "inner" iteration. Its accuracy will be decisive for the convergence 
process of the overall iteration. This can be motivated explained as follows. 

Recall that we can think of deflated eg as applying the eg method to the linear 
system 

A(I - V(V*AV)~ 1 V*A)x = (I - AV^AVy^-V^b . 

V v ' 

In here, the kernel of (/ — tta(S)) is S, the column range of V. Let us now assume 
that we are given an approximation M £ K mxm for the matrix (V* AV)' 1 . If we 
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then replace {V*AV)~ l by M in I - V(V*AV)~ 1 V*A, the operator is usually not 
a projection anymore, and S is not its kernel. That is, in general the matrix of the 
approximately deflated system 

A(I - VMV*A) 

is non-singular. The approximately deflated system will thus loose the property of 
having a considerable number of zero eigenvalues, an essential ingredient to estimate 
the effective condition number K c g in Theorem 13.61 Even worse, the eigenvalues 
that would be mapped to zero by the exact ^4-orthogonal projection now remain as 
small non-zero eigenvalues making the condition number of the approximately deflated 
system potentially larger than that of the original system. 

Initially we split the error into e_i = eo + eo, where eo € S A and eo G S. If this 
is done inexactly, eo will not be in S ±A , but its A-orthogonal projection on S will be 
small, i.e., we have 

e = eg" + e c lr with eg* G S ±A , e c " G S and ||e§"|| < p , 

for some tolerance t > 0. The eigenvectors corresponding to zero eigenvalues of the 
exactly deflated matrix A(I — ita(S)) span the subspace S. Thus, in the case where 
we work with the matrix A(I — V MV* A), its eigenvectors ug corresponding to very 
small eigenvalues /j,£ will be close to S, i.e., 

Ui = uf + uf 1 with uf G S, G S ±A and ||uf r || < p . 

Hence those components in the expansion of eo in the basis of eigenvectors of A(I — 
VMV*A) corresponding to small eigenvalues are very small. Together with ()2.3p and 
the subsequent discussion this explains that the eg method does not "see" these error 
components as long as they are substantially smaller than the other error components, 
resulting in an initial phase of fast convergence. However, when the norm of the 
current error approaches p, the error components belonging to the small eigenvalues 
will not be negligible anymore, and the eg iteration slows down dramatically. 

The question remains how to determine a suitable stopping criterion for the inner 
iteration based on the stopping criterion of the outer iteration 

INI <t||6|| =:e (6.2) 

for some < t < 1. More precisely, how do we have to choose r c for the inner 
stopping criterion 

Nl<r||6||=: e , 

to achieve (|6.2[) ? A first strategy may be to set t c = e but it turns out that we 
can relax this requirement. Since, our problem is equivalent to the question, how 
accurately we have to compute the matrix vector product A(I — ita(S))p for p G K™ 
in the outer eg iteration, we can use the results from [T9l [21] . There it is suggested 
to use 

t c = max < — ^— - , e > • c with < c < 1 
I Fill J 

in the i th outer iteration. That is the relative tolerance for the inner iteration can be 
relaxed while the outer iteration advances. 
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7. Conclusions. In this paper we derived a convergence estimate for the de- 
flated eg method based on algebraic multigrid theory. We have shown that our theory 
recovers the exact result for the convergence estimate for eigenvector deflation. By 
combining the deflation ansatz with the ideas of algebraic multigrid we not only gave 
a proof of convergence for deflation subspaces spanned by multigrid prolongation 
operators, but also pave the way for the theoretical analysis of more general defla- 
tion subspaces that are not necessarily A-invariant and do not need to be spanned 
by (approximate) eigenvectors. In this manner all the tools developed for the con- 
struction of efficient (algebraic) multigrid interpolation operators can be facilitated to 
construct improved deflation subspaces. Finally, the developed theory suggests that 
using the multigrid coarse-grid correction in a deflated conjugate gradient method 
yields a scalable — in the sense of constant number of iterations when increasing the 
resolution of the discretization — iterative method without the need of constructing a 
suitable smoothing iteration. This might be attractive in situations where such an 
iteration is hard to come by or not known all together. 
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